Method and apparatus for congestion control in a data network.



A method of controlling congestion in a virtual
circuit packet network. An initial packet buffer is
assigned to each virtual circuit at each node into
which incoming packets are stored and later
removed for forward routing. If a larger buffer is
desired for a virtual circuit to service a larger
amount of data, then additional buffer space is
dynamically allocated selectively to the virtual
circuit on demand if each node has sufficient
unallocated buffer space to fill the request. In one
embodiment the criterion for dynamic allocation is
based on the amount of data buffered at the data
source. In alternative embodiments, the criteria for
dynamic allocation may be further based on the
amount of data buffered at each node for a virtual
circuit and the total amount of free buffer space at
each node of a virtual circuit. Signaling protocols
are disclosed whereby data sources and virtual
circuit nodes maintain consistent information
describing the buffer allocations at all times.
METHOD AND APPARATUS FOR CONGESTION CONTROL IN A DATA NETWORK



Technical Field

The present invention relates to data networks 
in general and more particularly to protocols, meth- 
ods and apparatus that improve the flow of in- 
formation within such networks. 

Background of the Invention 

Packet-switched networks for the transport of
digital data are well known in the prior art.
Typically, data are transmitted from a host connecting
to a network through a series of network links and
switches to a receiving host. Messages from the
transmitting host are divided into packets that are
transmitted through the network and reassembled
at the receiving host. In virtual circuit networks,
which are the subject of the present invention, all
data packets transmitted during a single session
between two hosts follow the same physical
network path.

Owing to the random nature of data traffic, data
may arrive at a switching node of the network at an
instantaneous rate greater than the transmission
speed of the outgoing link, and data from some
virtual circuits may have to be buffered until they
can be transmitted. Various queueing disciplines
are known in the prior art. Early data networks
typically used some form of first-in-first-out (FIFO)
queueing service. In FIFO service, data packets
arriving from different virtual circuits are put into a
single buffer and transmitted over the output link in
the same order in which they arrived at the buffer.
More recently, some data networks have used
queueing disciplines of round robin type. Such a
network is described in a paper by A.G. Fraser
entitled "TOWARDS A UNIVERSAL DATA
TRANSPORT SYSTEM," printed in the IEEE
Journal on Selected Areas in Communications,
November 1983. Round robin service involves
keeping the arriving data on each virtual circuit in a
separate per-circuit buffer and transmitting a small
amount of data in turn from each buffer that
contains any data, until all the buffers are empty. U.S.
Pat. No. 4,583,219 to Riddle describes a particular
round robin embodiment that gives low delay to
messages consisting of a small amount of data.
Many other variations also fall within the spirit of
round robin service.

First-in-first-out queueing disciplines are somewhat
easier to implement than round robin
disciplines. However, under heavy-traffic conditions
first-in-first-out disciplines can be unfair. This is
explained in a paper by S.P. Morgan entitled
"QUEUEING DISCIPLINES AND PASSIVE
CONGESTION CONTROL IN BYTE-STREAM
NETWORKS," printed in the Proceedings of IEEE
INFOCOM, April 1989. When many users are
contending for limited transmission resources,
first-in-first-out queueing gives essentially all of the
bandwidth of congested links to users who submit long
messages, to the exclusion of users who are
attempting to transmit short messages. When there
is not enough bandwidth to go around, round robin
disciplines divide the available bandwidth equally
among all users, so that light users are not locked
out by heavy users.
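
The contrast between the two disciplines can be illustrated with a minimal Python sketch. It is not taken from the cited papers: it assumes that all cells are already queued when service begins and that round robin serves exactly one cell per non-empty buffer per turn. The function names and the circuit labels "A" and "B" are hypothetical.

```python
from collections import deque

def fifo_order(arrivals):
    """Serve cells from one shared buffer in arrival order."""
    return list(arrivals)

def round_robin_order(arrivals):
    """Serve one cell in turn from each non-empty per-circuit buffer."""
    buffers = {}
    for vc, cell in arrivals:
        buffers.setdefault(vc, deque()).append((vc, cell))
    order = []
    while any(buffers.values()):
        for vc in list(buffers):
            if buffers[vc]:
                order.append(buffers[vc].popleft())
    return order

# Heavy user "A" has six cells queued ahead of light user "B" with one cell.
arrivals = [("A", i) for i in range(6)] + [("B", 0)]
fifo_position = fifo_order(arrivals).index(("B", 0))
rr_position = round_robin_order(arrivals).index(("B", 0))
```

Under these assumptions the light user's single cell waits behind all six of the heavy user's cells in FIFO service, but is sent second under round robin.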

On any data connection it is necessary to keep
the transmitter from overrunning the receiver. This
is commonly done by means of a sliding-window
protocol, as described by A.S. Tanenbaum in the
book COMPUTER NETWORKS, 2nd ed., published
by Prentice Hall (1988), pp. 223-239. The transmitter
sends data in units called frames, each of which
carries a sequence number. When the receiver has
received a frame, it returns the sequence number
to the transmitter. The transmitter is permitted to
have only a limited number of sequence numbers
outstanding at once; that is, it may transmit up to a
specified amount of data and then it must wait until
it receives the appropriate sequential
acknowledgment before transmitting any new data. If an
expected acknowledgment does not arrive within a
specified time interval, the transmitter retransmits
one or more frames. The maximum number of bits
that the transmitter is allowed to have in transit at
any given time is called the window size and will
be denoted here by W. The maximum number of
outstanding sequence numbers is also sometimes
called the window size, but that usage will not be
followed here.
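
A minimal Python sketch of the transmitter side of such a protocol follows. It is an illustration only, with the window measured in bits as above; the class and method names are hypothetical, and the retransmission timer is omitted.

```python
class WindowedSender:
    """Tracks outstanding frames against a window of W bits."""

    def __init__(self, window_bits):
        self.window_bits = window_bits
        self.next_seq = 0
        self.outstanding = {}  # sequence number -> frame size in bits

    def in_transit(self):
        return sum(self.outstanding.values())

    def try_send(self, frame_bits):
        """Return the sequence number used, or None if the window is full."""
        if self.in_transit() + frame_bits > self.window_bits:
            return None
        seq = self.next_seq
        self.outstanding[seq] = frame_bits
        self.next_seq += 1
        return seq

    def acknowledge(self, seq):
        """Sequential acknowledgment: release all frames up to seq."""
        for s in [s for s in self.outstanding if s <= seq]:
            del self.outstanding[s]

sender = WindowedSender(window_bits=1000)
first = sender.try_send(400)
second = sender.try_send(400)
blocked = sender.try_send(400)   # would exceed W, so the sender must wait
sender.acknowledge(first)
third = sender.try_send(400)     # window has reopened
```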

Suppose that the transmitter and receiver are
connected by a circuit of speed S bits per second
with a round-trip propagation time T0 seconds, and
that they are able to generate or absorb data at a
rate not less than S. Let W be the window size.
Then, to maintain continuous transmission on an
otherwise idle path, W must be at least as large as
the round-trip window W0, where W0 is given by
W0 = S·T0. W0 is sometimes called the
delay-bandwidth product. If the circuit passes through a
number of links whose speeds are different, then S
represents the speed of the slowest link. If the
window is less than the round-trip window, then the
average fraction of the network bandwidth that the
circuit gets cannot exceed W/W0.
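
As a worked illustration of these relations, the following Python sketch computes the round-trip window and the resulting bound on the bandwidth fraction. The function names and the numerical values (a 1.5 megabit per second circuit with a 60 ms round trip) are chosen only for the example.

```python
def round_trip_window_bits(speed_bps, round_trip_s):
    """W0 = S * T0, the delay-bandwidth product in bits."""
    return speed_bps * round_trip_s

def max_bandwidth_fraction(window_bits, speed_bps, round_trip_s):
    """Upper bound W/W0 on the fraction of S the circuit can use."""
    w0 = round_trip_window_bits(speed_bps, round_trip_s)
    return min(1.0, window_bits / w0)

w0 = round_trip_window_bits(1_500_000, 0.060)            # about 90,000 bits
half = max_bandwidth_fraction(45_000, 1_500_000, 0.060)   # about 0.5
full = max_bandwidth_fraction(180_000, 1_500_000, 0.060)  # capped at 1.0
```

A window of half the round-trip window therefore limits the circuit to at most half of the slowest link's speed, however lightly the path is loaded.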

In principle, if a circuit has a window of a given
size, buffer space adequate to store the entire
window must be available at every queueing point
to prevent packet loss in all cases, since forward
progress can momentarily come to a halt at the
beginning of any link. This is explained in more
detail below. On a lightly loaded network,
significant delays are unlikely and there can generally be
sharing of buffer space between circuits. However,
the situation is different when the network is
congested. Congestion means that too much traffic has
entered the network, even though individual circuits
may all be flow controlled. Uncontrolled congestion
can lead to data loss due to buffer overflow, or to
long delays that the sender interprets as losses.
The losses trigger retransmissions, which lead to an
unstable situation in which network throughput
declines as offered load increases. Congestion
instability comes about because whenever data has
to be retransmitted, the fraction of the network's
capacity that was used to transmit the original data
has been lost. In extreme cases, a congested
network can deadlock and have to be restarted.

Congestion control methods are surveyed by
Tanenbaum, op. cit., pp. 287-88 and 309-320.
Many congestion control methods involve the
statistical sharing of buffer space in conjunction with
trying to sense the onset of network congestion.
When the onset of congestion is detected, attempts
are made to request or require hosts to slow down
their input of data into the network. These
techniques are particularly the ones that are subject to
congestion instability. Abusive hosts may continue
to submit data and cause buffer overflow. Buffer
overflow causes packet losses not only of a host
submitting the packets that cause the overflow, but
also of other hosts. Such packet loss then gives
rise to retransmission requests from all users losing
packets and it is this effect that pushes the network
toward instability and deadlock. Alternatively, as
mentioned above, it has been recognised for a long
time that congestion instability due to data loss
does not occur in a virtual-circuit network, provided
that a full window of memory is allocated to each
virtual circuit at each queueing node, and provided
that if a sender times out it does not retransmit
automatically but first issues an inquiry message to
determine the last frame correctly received. If full
per-circuit buffer allocation is combined with an
intrinsically fair queueing discipline, that is, some
variant of round robin, the network is stable and as
fair as it can be under the given load.

The DATAKIT (registered trademark) network
is a virtual circuit network marketed by AT&T that
operates at a relatively low transmission rate and
provides full window buffering for every virtual
circuit as just described. This network uses
technology similar to that disclosed in U.S. Patent Re.
31,319, which reissued on July 19, 1983 from A.G.
Fraser's U.S. Patent No. 3,749,345 of July 31,
1973, and operates over relatively low-speed T1
channels at approximately 1.5 megabits per
second. The DATAKIT network is not subject to
network instability because of full-window buffering for
each virtual circuit and because data loss of one
host does not cause data loss of other users.
Dedicated full-window buffering is reasonable for
such low-speed channels; however, the size of a
data window increases dramatically at speeds
higher than 1.5 megabits per second, such as might be
used in fiber-optic transmission. If N denotes the
maximum number of simultaneously active virtual
circuits at a node, the total buffer space that is
required to provide a round-trip window for each
circuit is N·S·T0. It may be practicable to supply this
amount of memory at each node of a low-speed
network of limited geographical extent. However, at
higher speeds and network sizes, it ultimately
ceases to be feasible to dedicate a full round-trip
window of memory for every virtual circuit. For
example, assuming a nominal transcontinental
packet round-trip propagation time of 60 ms, a
buffer memory of 11 kilobytes is required for every
circuit at every switching node for a 1.5 megabits
per second transmission rate. This increases to 338
kilobytes at a 45 megabits per second rate.
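
The arithmetic behind these figures can be checked with a short Python sketch. The function names are hypothetical, and the figure of 1000 simultaneously active circuits per node is an assumed value used only to show how the total scales with N.

```python
def window_bytes(speed_bps, round_trip_s):
    """Full round-trip window for one circuit, in bytes (S * T0 / 8)."""
    return speed_bps * round_trip_s / 8

def node_buffer_bytes(n_circuits, speed_bps, round_trip_s):
    """Memory to dedicate a full window to each of N circuits at one node."""
    return n_circuits * window_bytes(speed_bps, round_trip_s)

per_circuit_t1 = window_bytes(1_500_000, 0.060)      # about 11,250 bytes
per_circuit_45m = window_bytes(45_000_000, 0.060)    # about 337,500 bytes
# Assumed N = 1000 circuits at 45 megabits per second:
node_total_45m = node_buffer_bytes(1000, 45_000_000, 0.060)
```

With these assumptions a single node at 45 megabits per second would need on the order of 340 megabytes of dedicated buffer memory, which is the growth the text describes as ceasing to be feasible.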

A need exists for solutions to the problem of
avoiding congestion instability, while at the same
time avoiding the burgeoning buffer memory
requirements of known techniques. It is therefore an
overall object of the present invention to retain the
advantages of full-window buffering while
substantially reducing the total amount of memory
required.

It is another object of the invention to reduce
the amount of buffering required for each circuit by
the sharing of buffer memory between circuits and
by dynamic adjustment of window sizes for circuits.

U.S. Pat. No. 4,736,369 to Barzilai et al.
addresses some aspects of the problem of adjusting
window sizes dynamically during the course of a
user session, in response to changes in traffic
patterns and buffer availability. However, this patent
assumes a network in which flow control and
window adjustment are done on a link-by-link basis,
that is, as a result of separate negotiations between
every pair of adjacent nodes on the path between
transmitter and receiver. For high-speed networks,
link-by-link flow control is generally considered to
be less suitable than end-to-end control, because
of the additional computing load that link-by-link
control puts on the network nodes.

Thus, it is another object of the invention to
perform flow control on an end-to-end basis with
dynamically adjustable windows.



The invention is a method of controlling
congestion in a virtual circuit data network. A data
buffer is assigned to each virtual circuit at each
node into which incoming data is stored and later
removed for forward routing. The size of a buffer for
each virtual circuit at a switching node is
dynamically allocated in response to signals requesting
increased and decreased data window sizes,
respectively. If a larger buffer is desired for a virtual
circuit to service a larger amount of data, then
additional buffer space is dynamically allocated
selectively to the virtual circuit on demand if each
node has sufficient unallocated buffer space to fill
the request. Conversely, the allocated buffer space
for a circuit is dynamically reduced when the data
source no longer requires a larger buffer size. In
one embodiment, the additional space is allocated
to a virtual circuit in one or more blocks of fixed
size, up to a maximum of a full data window,
wherein a full data window is defined as the virtual
circuit transmission rate multiplied by a
representation of the network round trip propagation delay. In
a second embodiment, the additional allocation is
done in blocks of variable size.

The size of a block to be allocated at each
node of a virtual circuit is determined based on the
amount of data waiting to be sent at the packet
source, and on the amount of unallocated buffer
space at each said node. It may also be based on
the amount of data already buffered at each said
node.

To perform the additional allocation at each
node of a virtual circuit in a representative
embodiment of the invention, a first control message is
transmitted along a virtual circuit from the first
node in the circuit to the last node in the circuit.
Each node writes information into the first control
message as it passes through, describing the
amount of unallocated buffer space at the node and
the amount of data already buffered at the node.
The last node in the virtual circuit returns the first
control message to the first node, where the size of
an allocated block is determined based on the
information in the returned first control message. A
second control message is then transmitted from
the first node to the last node in the virtual circuit
specifying the additional space.
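
The two-message exchange can be sketched in Python as follows. This is a simplified illustration, not the disclosed protocol: the decision rule used here (grant the smaller of the requested amount and the least free space reported by any node) is an assumption, the amount of data already buffered at each node is ignored, and all names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    free_cells: int                                  # unallocated buffer space
    allocated: dict = field(default_factory=dict)    # vc -> cells allocated

def first_message(path):
    """Each node writes its unallocated space into the message in turn."""
    return [node.free_cells for node in path]

def decide_block(requested_cells, reported_free):
    """Assumed rule: never grant more than the tightest node can supply."""
    return min(requested_cells, min(reported_free))

def second_message(path, vc, block):
    """Each node commits the additional space specified by the first node."""
    for node in path:
        node.free_cells -= block
        node.allocated[vc] = node.allocated.get(vc, 0) + block

path = [Node(100), Node(40), Node(70)]
block = decide_block(64, first_message(path))   # limited by the middle node
second_message(path, vc=7, block=block)
```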

Brief Description of the Drawing 

In the drawing, 

Fig. 1 discloses the architecture of a typical
data switching network having a plurality of
switching nodes connected to user packet host sources
and destinations;

Fig. 2 discloses illustrative details of a data
receiving and queueing arrangement at a node for
an incoming channel having a plurality of
multiplexed time slots corresponding to individual
virtual circuits;

Fig. 3 discloses illustrative details of a
controller of Fig. 2 that administers the buffer space
allocation and data queueing of virtual circuits on
an incoming channel;

Fig. 4 discloses illustrative details of a router
that converts between variable-length data packets
from a host and constant-length data cells and
further administers the buffer space allocation and
data queueing at the router;

Fig. 5 shows an illustrative method of
determining buffer lengths of data for a virtual circuit at a
router or switching node;

Figs. 6 and 7 show illustrative flowcharts
depicting the protocols and method steps performed
at the routers and nodes of dynamically allocating
buffer space for a virtual circuit at routers and
nodes for an embodiment in which buffer lengths at
input routers are used as decision criteria for
dynamic buffer allocation; and

Figs. 8 through 12 disclose flowcharts
depicting the protocols and method steps performed at
the routers and nodes for allocating buffer space in
blocks of fixed or varying sizes to virtual circuits in
an embodiment in which buffer lengths at nodes
are used in conjunction with buffer lengths at
routers as decision criteria.

Detailed Description 

Fig. 1 shows a block diagram of an illustrative
packet-switching network. It is assumed that the
network interconnects many packet sources and
destinations by means of virtual circuits among a
number of routers and switching nodes. Packet
sources and destinations are attached to local area
networks that are on user sites. For example,
source 102 is connected to a local network 106,
which is connected to a router 110. One of the
functions of the router is to convert between the
variable-length data packets issued by the source
and the constant-length data cells transmitted and
switched by the cell network 100. While cells are
considered to be of fixed length, this is not a
limitation of the invention. Other functions of the
router relevant to the invention will be described
below.

The router attaches the local network 106 to
the cell network 100 via the access line 108. Data
cells belonging to a particular virtual circuit are
transmitted through a sequence of switching nodes
114 and data links 116 to an access line 118 that is
connected to a router 120. The router 120
reassembles the data cells into data packets addressed
to a particular destination, and transmits the
packets to the local network 124, from whence they are
taken by the destination 128.

It is assumed for purposes of disclosure that
the network 100 is similar to the DATAKIT (R)
virtual circuit network marketed by AT&T, except
that the network 100 operates at a considerably
higher transmission rate. That is, it is assumed that
network 100 establishes a virtual circuit path
between a source router and a destination router via
selected ones of the switching nodes 114 when a
connection is first initiated. Packets passing from a
source to a destination are routed via the virtual
circuit for the duration of the connection, although
the actual transmission lines and bandwidth on the
transmission lines in the path are not dedicated to
the connection in question, but might be
time-shared among many such connections.

In accordance with the invention, Fig. 2 shows
an illustrative embodiment of a cell buffering
arrangement at a node. This buffering arrangement is
able to handle many virtual circuits. Buffer space is
allocated per virtual circuit and the allocation for a
virtual circuit can be changed dynamically, under
control of the monitor 200. The monitor is a
conventional microprocessor system that is used to
implement congestion control mechanisms to be
described later. The receiver 202 and transmitter
204 in the figure are conventional, and the
transmitter may implement round robin service among the
virtual circuits using established techniques.

When a cell arrives, the receiver 202
determines whether the cell is a congestion message as
indicated by a bit in the header. Congestion
messages are stored in a separate FIFO queue 206 for
the monitor. If an arriving cell is not a congestion
message, the receiver 202 produces a virtual
circuit number on bus WVC and a write request on
lead WREQ. The receiver places the cell on its
output bus 208 where it is buffered in the cell
queue 210 under the control of the controller 212.
The cell queue 210 is a memory array of some
suitable size, which for the purposes of exposition
is organised in words which are one cell wide.

The receiver 202 and the transmitter 204 are
autonomous circuits. Each operates independently
of the other to enter cells to and remove cells from
the cell queue 210, respectively. When the
transmitter 204 is ready to send a cell, it produces a
virtual circuit number on bus RVC and a read
request on lead RREQ. If the allocated buffer in
queue 210 associated with virtual circuit RVC is
empty, the controller 212 will indicate this condition
by setting signal EMPTY to a value of TRUE and
the transmitter can try another virtual circuit.
Otherwise, the next cell in the buffer associated with
RVC will appear on the output bus to be read by
the transmitter 204. The controller 212 controls the
cell queue via signals on bus MADDR and leads
MW and MR. MADDR is the address in the cell
queue 210 at which the next cell is to be written or
read. MW and MR signify a queue write or read
operation, respectively. Congestion messages
generated by the monitor 200 are stored in a separate
outgoing FIFO 214. These messages are
multiplexed with outgoing cells onto the transmission
line 216 by the transmitter.

To implement congestion control schemes, the
monitor 200 has access to data structures internal
to the controller 212 over the buses ADDR, R, W,
and DATA. These data structures include the
instantaneous buffer length for each virtual circuit
and the overall number of cells in the cell queue.
Averaging operations required to implement
congestion control, according to the protocols
described below, are performed by the monitor 200.

Fig. 3 shows illustrative details of the controller
212 of Fig. 2. The major functions of the controller
are to keep track of the buffer allocation for each
virtual circuit, to keep track of the instantaneous
buffer use (buffer length) for each virtual circuit, to
manage the allocation of memory in the cell queue
such that data can be buffered for each virtual
circuit in a dedicated buffer of dynamically varying
length, and to control the writing and reading of
data in the cell queue as it is received and
transmitted. For the purposes of exposition, memory is
partitioned in the queue in units of one cell. This
section first describes the basic elements of the
controller, and then describes the operations of
these elements in detail.

An arbiter 300 receives signals WREQ and
RREQ, which are requests to write a cell to a buffer
associated with a particular virtual circuit or to read
a cell from the buffer associated with a particular
virtual circuit, respectively. The arbiter ensures that
read and write operations occur in a non-interfering
manner, and that the select input to the multiplexer
(W_OR_R) is set such that input RVC is present on
bus VC during read operations and input WVC is
present on bus VC during write operations. The
remainder of this discussion will consider read and
write operations separately.

A table COUNT_TABLE 304 is provided for
storing the buffer allocation and buffer use for each
virtual circuit. The table is addressed with a virtual
circuit number on bus VC from the multiplexer 302.
Each virtual circuit has two entries in
COUNT_TABLE. One entry, LIMIT[VC], contains the
maximum number of cells of data that virtual circuit
VC is presently allowed to buffer. This, in turn,
determines the window size allocated to the virtual
circuit. The second entry, COUNT[VC], contains
the number of cells that are presently used in the
cell queue 210 by virtual circuit VC. The contents
of COUNT_TABLE can be read or written by the
monitor 200 at any time before or during the
operation of the controller 212.

A table QUEUE_POINTERS 306 contains the
read and write pointers for the buffer associated
with each virtual circuit. Read pointer RP[VC]
references the location containing the next cell to be
read from the buffer associated with virtual circuit
VC; write pointer WP[VC] references the next
location to be written in the buffer associated with
virtual circuit VC.

Buffers of dynamically varying length are
maintained by keeping a linked list of cells for each
virtual circuit. The linked lists are maintained by the
LIST_MANAGER 308, which also maintains a linked
list of unused cells that make up the free buffer
space. Operation of the LIST_MANAGER is
described below.

A GLOBAL_COUNT register 310 keeps track
of the total number of cells in all virtual circuit
buffers. If each virtual circuit is initialized with one
(unused) cell in its buffer, the initial value of the
GLOBAL_COUNT register is equal to the number of
virtual circuits. The GLOBAL_COUNT register can
be written or read by the monitor. The
TIMING+CONTROL circuit 312 supplies all of the
control signals needed to operate the controller.

Prior to the start of read request or write
request operations, the controller is initialized by the
monitor. For each virtual circuit, WP[VC] and
RP[VC] are initialized with a unique cell number and
COUNT[VC] is initialized with a value of 1,
representing an empty buffer with one (unused) cell
present for receipt of incoming data. The initial
value of LIMIT[VC] is the initial buffer allocation for
that virtual circuit, which is equivalent to its initial
window size. The LIST_MANAGER is initialized
such that the free list is a linked list containing all
cells in the cell queue 210 except those which are
initialized in table QUEUE_POINTERS.

When a cell arrives, the receiver asserts a write
request on WREQ and the virtual circuit number on
WVC. Bus VC is used to address COUNT_TABLE,
causing the values in the COUNT[VC] and
LIMIT[VC] fields to be sent to a comparator 314. If the
virtual circuit in question has not consumed all of
its allocated space in the cell queue, i.e. if
COUNT[VC] is less than LIMIT[VC] in the table, the
comparator will generate a FALSE value on lead
LIMITREACHED. Bus VC is also used to address
the QUEUE_POINTERS table such that WP[VC] is
present on bus MADDR. When LIMITREACHED is
FALSE, the timing and control circuit will generate
signal MW, which causes the cell to be written to
the cell queue 210, and will control the
LIST_MANAGER to cause a new cell to be allocated
and linked into the buffer associated with VC. In
addition, the buffer use for VC and the overall cell
count values will be updated. To update the buffer
use, the present value in COUNT[VC] will be
routed via bus COUNT to an up/down counter, which
increments the present number of cells recorded in
COUNT[VC] by one. This new value, appearing on
bus NCOUNT, is present at the input of
COUNT_TABLE, and will be written into the table.
The overall cell count is incremented in a similar
manner using register GLOBAL_COUNT 310 and an
up/down counter 316.

If, during a write operation, LIMITREACHED is
TRUE, which means that the virtual circuit in
question has consumed all of its allocated space in the
cell queue, the T+C circuit 312 will not generate
signals to write data into the cell queue, to allocate
a new cell, or to increment the value of
COUNT[VC] or GLOBAL_COUNT. Accordingly, any VC
exceeding its assigned window size loses the
corresponding cells, but the data for other virtual
circuits is not affected.

When the transmitter is ready to send a new
cell, it asserts a read request on lead RREQ and
the virtual circuit number on bus RVC.
COUNT_TABLE is accessed, causing the value of
COUNT[VC] to be sent to a comparator 318, whose
second input is the value zero. If the buffer
associated with VC contains no data, the comparator
318 will generate a TRUE signal on EMPTY, and
the operation will be terminated by the
TIMING+CONTROL circuit 312. If EMPTY is
FALSE, the up/down counter 320 will decrement
the value of COUNT[VC], and the resulting value
will be written into COUNT_TABLE 304. In this
case, the value of RP[VC] from QUEUE_POINTERS
is present on bus MADDR and the MR signal is
generated, reading a cell from the cell queue 210.
RP[VC] is also input to the LIST_MANAGER 308
so that the cell can be deallocated and returned to
the free store. The address of the next cell in the
buffer for VC is present on bus NRP and is written
into QUEUE_POINTERS 306. The overall count of
cells buffered, which is stored in
GLOBAL_COUNT 310, is decremented.
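
The counting logic of the write and read operations can be summarized in a short Python sketch. It models only the COUNT, LIMIT and GLOBAL_COUNT bookkeeping, not the cell memory or the pointers. As a simplifying assumption, counts start at zero here rather than at the initial value of 1 described above, and the class and method names are hypothetical.

```python
class CountTable:
    """Per-circuit buffer use (COUNT) checked against allocation (LIMIT)."""

    def __init__(self, limits):
        self.limit = dict(limits)                 # LIMIT[VC]
        self.count = {vc: 0 for vc in limits}     # COUNT[VC], assumed to start at 0
        self.global_count = 0                     # GLOBAL_COUNT

    def write(self, vc):
        """Return True if the cell is accepted, False if it is dropped."""
        if self.count[vc] >= self.limit[vc]:      # LIMITREACHED is TRUE
            return False
        self.count[vc] += 1
        self.global_count += 1
        return True

    def read(self, vc):
        """Return True if a cell was removed, False if EMPTY is TRUE."""
        if self.count[vc] == 0:
            return False
        self.count[vc] -= 1
        self.global_count -= 1
        return True

table = CountTable({1: 2, 2: 1})
results = [table.write(1) for _ in range(3)]   # third cell exceeds LIMIT[1]
other = table.write(2)                         # circuit 2 is unaffected
```

The point of the sketch is the isolation property: a circuit that exceeds its own allocation loses only its own cells.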

The LIST_MANAGER 308 maintains a linked
list of memory locations which make up cell buffers
for each virtual circuit. It also maintains a linked list
of memory locations which make up the free list.
The LIST_MANAGER 308 contains a link memory
LNKMEM 322, which contains one word of
information for every cell in the cell queue 210. The width
of a word in LNKMEM 322 is the logarithm to base
2 of the number of cells in the cell queue 210.
There is a register, FREE 324, which contains a
pointer to the first entry in the free list.

Consider the buffer for virtual circuit VC. The
read pointer RP[VC] points to a location in the cell
buffer at which the next cell for virtual circuit VC is
to be read by the transmitter. RP[VC] points to a
location in LNKMEM 322 which contains a pointer
to the next cell to be read from the cell queue 210,
and so on. Proceeding in this manner, one arrives
at a location in LNKMEM 322 which points to the
same location pointed to by WP[VC]. In the cell
queue 210 this location is an unused location which
is available for the next cell to arrive for VC.

Free space in the cell queue 210 is tracked in
LNKMEM 322 by means of a free list. The
beginning of the free list is maintained in a register
FREE 324 which points to a location in the cell
queue 210 which is not on the buffer for any virtual
circuit. FREE points in LNKMEM 322 to a location
which contains a pointer to the next free cell, and
so on.

When a write request occurs for a virtual circuit
VC, if VC has not exceeded its buffer allocation, a
new cell will be allocated and linked into the buffer
associated with VC. The value in WP[VC] is input
to the LIST_MANAGER 308 on bus WP at the
beginning of the operation. A new value NWP of
the write pointer is output by the LIST_MANAGER
308 at the end of the operation. NWP will be
written into table QUEUE_POINTERS 306. This
occurs as follows:

1) The value in register FREE 324, which
represents an unused cell, will be chained into the
linked list associated with VC, and will also be
output as NWP.

    NWP = LNKMEM[WP] = FREE

2) The next free location in the free list will be
written into FREE 324.

    FREE = LNKMEM[FREE]

When a read request occurs for a virtual circuit
VC, the cell which is currently being read, namely
RP[VC], will be input to the LIST_MANAGER 308
on bus RP to be returned to the free list, and the
next cell in the buffer associated with VC will be
returned as NRP. NRP will be written into table
QUEUE_POINTERS 306. This occurs as follows:

1) A new read pointer is returned which points
to the next cell in the buffer associated with VC.

    NRP = LNKMEM[RP]

2) The cell which was read in this cycle is
deallocated by linking it into the free list.

    LNKMEM[RP] = FREE
    FREE = RP
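
The pointer manipulations above can be traced with a minimal Python sketch of the list manager. It models LNKMEM as a list of cell indices and FREE as a single index; the class name, the method names, and the choice to give circuit k the initial cell k are assumptions made for the example. The COUNT and LIMIT check is assumed to happen elsewhere before `write` is called.

```python
class ListManager:
    """Per-circuit linked lists of cells plus a free list, as in LNKMEM."""

    def __init__(self, n_cells, n_circuits):
        # Each circuit starts with one unused cell; RP == WP means empty.
        self.rp = list(range(n_circuits))
        self.wp = list(range(n_circuits))
        self.lnkmem = [None] * n_cells
        # Chain the remaining cells into the free list.
        for cell in range(n_circuits, n_cells - 1):
            self.lnkmem[cell] = cell + 1
        self.free = n_circuits if n_cells > n_circuits else None

    def write(self, vc):
        """Return the address written for vc and link in a fresh cell."""
        if self.free is None:
            raise MemoryError("free list exhausted")
        written = self.wp[vc]
        nwp = self.free                  # NWP = LNKMEM[WP] = FREE
        self.lnkmem[written] = nwp
        self.free = self.lnkmem[nwp]     # FREE = LNKMEM[FREE]
        self.wp[vc] = nwp
        return written

    def read(self, vc):
        """Return the address read for vc, or None if the buffer is empty."""
        if self.rp[vc] == self.wp[vc]:
            return None
        rp = self.rp[vc]
        nrp = self.lnkmem[rp]            # NRP = LNKMEM[RP]
        self.lnkmem[rp] = self.free      # LNKMEM[RP] = FREE
        self.free = rp                   # FREE = RP
        self.rp[vc] = nrp
        return rp

manager = ListManager(n_cells=6, n_circuits=2)
addresses_written = [manager.write(0), manager.write(0)]
addresses_read = [manager.read(0), manager.read(0), manager.read(0)]
```

In this sketch cells are read back in the order in which they were written, and each cell read becomes the new head of the free list.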

Fig. 4 is an illustrative embodiment of a router,
such as 110 of Fig. 1. Variable length packets
arriving from the local area network 106 of Fig. 1
are received by the LAN receiver 400 at the upper
left of Fig. 4. A global address, present in each
packet, is translated to a virtual circuit number by
the translation circuit 402. Since the packet will be
transported using fixed length cells that may be
smaller or larger than the length of the particular
packet, header or trailer bytes may need to be
added to the packet to facilitate reassembly of the
packet from a sequence of cells which arrive at the
destination router, to allow a destination router to
exert flow control over a source router, or to allow
dropped or misdirected cells to be detected. The
resulting information must be padded to a length
which is an integral multiple of the cell size. These
functions are not pertinent to the invention;
however, an illustrative embodiment is described to
indicate the relationship of these functions to the
congestion management functions that must be
performed by the router.

The LAN packet and the virtual circuit number
produced by the translation circuit 402 are passed
to segmentation circuit 404, which may add header
or trailer bytes to the packet, either for the functions
described above or as placeholders for such
bytes to be supplied by a second segmentation
circuit 408. The resulting information is padded to
an integral multiple of the cell size and is stored in
a cell queue 406, which may be identical in structure
to the cell queue 210 described in Fig. 2. In
particular, internal data structures in a controller
410 may be accessed by monitor 412; these allow
the buffer use (buffer length) to be monitored for
each virtual circuit, and allow the buffer allocation
per virtual circuit to be adjusted dynamically.
Segmentation circuit 408 performs window flow
control on each virtual circuit, where the window
size for each virtual circuit may be varied dynamically
under the control of the protocols described
below. To perform window flow control, segmentation
circuit 408 may fill in the added data bytes as
appropriate to complete the reassembly and flow
control protocol. As a minimum, segmentation circuit
408 maintains a counter per virtual circuit
which keeps track of the amount of outstanding,
unacknowledged data that it has sent in order to
implement window flow control, and it receives
acknowledgments from the remote receiver indicating
data that has passed safely out of the flow
control window. Techniques for implementing reassembly
and window flow control are well known in
the art; the unique aspect of the invention is that
the window sizes and buffer sizes may change
dynamically under the influence of congestion control
messages. The transmitter 415 takes cells from
segmentation circuit 408, from the local receiver as
described below, and from the outgoing congestion
FIFO 419 and sends them out on the outgoing cell
transmission line 416.
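The per-virtual-circuit counter of outstanding, unacknowledged data can be pictured with the following minimal sketch. The class WindowCounter and its method names are hypothetical, and the unit of "amount" (cells or bytes) is left open, since the text does not fix it.

```python
class WindowCounter:
    """Tracks unacknowledged data for one virtual circuit (illustrative only)."""

    def __init__(self, window_size):
        self.window_size = window_size  # current flow control window
        self.outstanding = 0            # data sent but not yet acknowledged

    def can_send(self, amount):
        # Data may be sent only while it still fits inside the window.
        return self.outstanding + amount <= self.window_size

    def on_send(self, amount):
        if not self.can_send(amount):
            raise ValueError("send would exceed the flow control window")
        self.outstanding += amount

    def on_ack(self, amount):
        # An acknowledgment moves data out of the flow control window.
        self.outstanding = max(0, self.outstanding - amount)

    def set_window(self, new_size):
        # The window may change under congestion control messages.
        self.window_size = new_size
```

A shrinking window would normally be applied only once the outstanding amount already fits, which the caller can check by comparing outstanding with the new size before calling set_window.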

Router 110 also receives cells from network
100 via the access line 112 of Fig. 1. These cells
arrive at the receiver 414 at the lower right corner
of Fig. 4. Insofar as these cells result from packets
originated by the source 102 and intended for the
destination 128, they will be either congestion messages
or acknowledgments from the remote router
120. The handling of cells that may arrive on
access line 112 from other sources, which are
attempting to communicate with destinations attached
to local network 106, will be deferred until
the discussion of router 120 below.

When a cell of one of the two types under
consideration arrives, the receiver 414 determines
whether the cell is a congestion message as indicated
by a bit in the header. Congestion messages
are stored in a separate FIFO queue 417 for
the monitor 412 and handled according to one of
the protocols described below. If the protocol generates
a further congestion message, an appropriate
cell is sent from the monitor 412 to segmentation
circuit 408 and multiplexed onto the outgoing
cell transmission line 416. If an arriving cell
is not a congestion message, the receiver 414
sends the cell to reassembly circuit 418, which
determines whether a cell is an acknowledgment
from the remote router. If this is the case, reassembly
circuit 418 sends an acknowledgment-received
notification to segmentation circuit 408, so that it
can update the count of the amount of outstanding
data.

A router identical in structure with Fig. 4 may
also represent element 120 of Fig. 1. In such case,
the receiver 414 corresponding to such router
takes cells from the outgoing access line 118 of
Fig. 1. Insofar as these cells result from packets
originated by the source 102 and intended for the
destination 128, they will be either data cells or
congestion messages from the remote router 110.
When a cell arrives, the receiver 414 determines
whether the cell is a congestion message as indicated
by a bit in the header. Congestion messages
are stored in a separate FIFO queue 417 for
the monitor 412 and handled according to one of
the protocols described below. If the protocol generates
a further congestion message, an appropriate
cell is sent from the monitor 412 to segmentation
circuit 408 and multiplexed onto the outgoing
cell transmission line 416 at the lower left of
Fig. 4. If an arriving cell is not a congestion message,
the receiver 414 sends the cell to reassembly
circuit 418, which buffers the arriving cell in a
per-virtual circuit buffer in cell queue 420. If the
reassembly circuit 418 detects that a complete
local area network packet has been accumulated,
reassembly circuit 418 sends a send-acknowledgment
command to the local transmitter 416 on lead
422, which causes an acknowledgment message to
be sent to the remote router 110. In addition,
reassembly circuit 418 issues multiple-read requests
to the buffer controller 422, causing the cells
which make up the packet to be sent in succession
to reassembly circuit 424. To facilitate the reassembly
procedure, reassembly circuit 424 may delete
any header or trailer bytes which were added
when the packet was converted to cells by router
110. The packet is then sent to the translation
circuit 426, where the global address is translated
into a local area network specific address before
the packet is sent onto the local area network 124.



Choice of window sizes 

The operation of the apparatus and protocols
described in this invention does not depend on the
choice of window sizes. Various practical considerations
may determine the window sizes that are
used. If there are only two window sizes, the following
considerations lead to preferred relationships
among the numbers of virtual circuits and the window
sizes.

Suppose that the maximum number of virtual
circuits that can be simultaneously active at a given
node is N_0. Suppose further that it is decided to
provide some number N_1 less than N_0 of the virtual
circuits with full-size windows W_0, while providing
the remaining N_0 - N_1 virtual circuits with buffers of
some smaller size B_0 that is adequate for light
traffic. If there are N_1 simultaneous users each of
whom gets an equal fraction of the channel, the
fraction of the channel that each gets is 1/N_1. The
maximum fraction of the channel capacity that can
be obtained by a user having a window size B_0 is
B_0/W_0. Setting 1/N_1 equal to the maximum fraction
of the trunk that can be had by a user with a small
buffer, namely B_0/W_0, gives the following relationship
among the quantities: W_0/B_0 = N_1. The total
buffer space B allocated to all the virtual circuits is

B = (N_0 - N_1) B_0 + N_1 W_0 = N_0 B_0 - W_0 + W_0^2 / B_0.

Minimizing B with respect to B_0 leads to

B_0 = W_0 / (N_0)^{1/2},
N_1 = (N_0)^{1/2},
B = [2 (N_0)^{1/2} - 1] W_0.

These equations provide preferred relationships
among B_0, N_1, N_0, and W_0.
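As a numerical check on the two-window-size relationships, the following sketch evaluates the preferred small buffer size, the number of full-size windows, and the total buffer space, and compares the closed form against the general expression for B. The function names are illustrative assumptions, and the quantities are treated as real numbers rather than rounded to whole cells.

```python
import math

def preferred_allocation(n0, w0):
    """Return (B_0, N_1, B) for N_0 virtual circuits and full window W_0."""
    root = math.sqrt(n0)
    b0 = w0 / root               # B_0 = W_0 / (N_0)^(1/2)
    n1 = root                    # N_1 = (N_0)^(1/2)
    total = (2 * root - 1) * w0  # B = [2 (N_0)^(1/2) - 1] W_0
    return b0, n1, total

def total_buffer(n0, w0, b0):
    """General expression B = (N_0 - N_1) B_0 + N_1 W_0 with N_1 = W_0 / B_0."""
    n1 = w0 / b0
    return (n0 - n1) * b0 + n1 * w0
```

For example, with N_0 = 100 and W_0 = 1000 the preferred values are B_0 = 100 and N_1 = 10, for a total of 19000 units of buffer space, compared with 100000 if every virtual circuit were given a full-size window.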

If there are more than two window sizes, various
choices are possible. It may be convenient to
choose the sizes in geometric progression, for example,
increasing by powers of 2. An alternative
approach that may be preferred in some instances
is to have different sizes correspond to round-trip
windows at various standard transmission speeds.
Still other choices may be dictated by other circumstances.

Buffer Allocation Protocols

The following discusses protocols by means of
which sharable buffer space is allocated and deallocated
and by means of which virtual-circuit
nodes, routers, and hosts are so alerted. The reader
is directed to Figs. 5 through 12 as required.

Each node controller 212 keeps track of the
buffer length of each of its virtual circuits via the
entry COUNT[VC] in the table COUNT_TABLE
that has been described in connection with Fig. 3.
Each node controller also keeps track of the size of
the free list, which is the difference between the
(fixed) number of cells in the cell queue 210 of Fig.
2 and the contents of the register
GLOBAL_COUNT 310 described in connection
with Fig. 2. All of these quantities are available to
be read at any time by the node monitor 200
shown in Fig. 2. In a similar way, each router keeps
track of the input buffer length of each of its virtual
circuits, in a table that is available to the router
monitor 412 shown in Fig. 4. For purposes of
disclosure, it will be assumed that each router
manages its cell queue 406, shown on the left side
of Fig. 4, in a manner similar to the switching
nodes, so that quantities analogous to COUNT and
GLOBAL_COUNT 310 are available to the router's
monitor.

It is unnecessary, but desirable, for the node
controllers and the routers to maintain smoothed
averages of buffer lengths. A popular smoothing
procedure for the time-varying quantity q is given
by the easily implementable recursive
algorithm

r_n = (1 - f) q_n + f r_{n-1}

where q_n represents the value of q at epoch n, r_{n-1}
represents the moving average at epoch n-1, r_n
represents the moving average at epoch n, and f is
a number between 0 and 1 that may be chosen to
control the length of the averaging interval. If observations
are made at intervals of Δt seconds, the
approximate averaging interval is T_AV seconds,
where

T_AV = (1 - 1/log f) Δt

Appropriate averaging intervals for network congestion
control may be between 10 and 100 round-trip
times.
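A minimal sketch of such a smoothing procedure is given below. It assumes the usual exponential form in which the new average is (1 - f) times the new observation plus f times the previous average, and it assumes that the logarithm in the averaging-interval expression is the natural logarithm; neither the function names nor the choice of logarithm base is stated in the text.

```python
import math

def smooth(r_prev, q_n, f):
    """One step of the moving average: (1 - f) * q_n + f * r_prev."""
    return (1.0 - f) * q_n + f * r_prev

def averaging_interval(f, dt):
    """Approximate averaging interval (1 - 1/log f) * dt.

    Assumption: 'log' is the natural logarithm. Since 0 < f < 1,
    log f is negative and the result is positive.
    """
    return (1.0 - 1.0 / math.log(f)) * dt

# Example: smooth a sequence of observations taken every 0.01 seconds.
r = 0.0
for q in [1, 1, 0, 1, 1, 1, 0, 1]:
    r = smooth(r, q, f=0.9)
```

With f = 0.9 the averaging interval comes to a little over ten observation periods, so choosing f closer to 1 lengthens the interval.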

In various embodiments of the present invention,
up to four binary quantities are used with each
virtual circuit as indicators of network congestion.
These quantities are defined as follows.

BIG_INPUT. A repetitive program at a router
is executed periodically (Fig. 5, step 500) to update
this parameter. It is set equal to 1 (step 508) for a
virtual circuit if a buffer in a cell queue such as 406
for that virtual circuit at the router 110 has been
occupied during more than a certain fraction of the
time in the recent past, and it is set equal to 0
(step 510) if the buffer has not been occupied
during more than that fraction of time. For the
determination of BIG_INPUT, the quantity q in the
moving-average algorithm (step 504) may be taken
as 1 or 0 depending on whether or not any data is
found in the buffer at the given observation. The
quantity r (step 506) is then an estimate of the
fraction of time that the buffer has been occupied
during the past T_AV seconds. A representative but
by no means exclusive threshold for r would be
0.5.

SOME_BACKLOG. This quantity is set equal
to 1 for a given virtual circuit at a given node 114
or output router 120 if the virtual-circuit buffer at
that node or router has been occupied during more
than a certain fraction of the time in the recent
past, and it is set equal to 0 otherwise. For the
determination of SOME_BACKLOG, the quantity q
in the moving-average algorithm may be taken as 1
or 0 depending on whether or not any data is found
in the virtual-circuit buffer at the given observation.
The quantity r is then an estimate of the fraction of
time that the buffer has been occupied during the
past T_AV seconds. The flow of control for the monitor
program that calculates SOME_BACKLOG is
entirely similar to Fig. 5. A representative but by no
means exclusive threshold for r would be 0.5. The
thresholds for BIG_INPUT and for SOME_BACKLOG
need not be the same.

BIG_BACKLOG. This quantity is set equal to 1
for a given virtual circuit at a given node or output
router if the virtual circuit has a large buffer length
at the node or router, and is set equal to 0 otherwise.
Since the lengths of buffers at bottleneck
nodes vary slowly, smoothing of the buffer length
is probably unnecessary. The criterion for a large
buffer length may depend on the set of window
sizes. If the window sizes are related by factors of
2, a representative although not exclusive choice
would be to set BIG_BACKLOG equal to 1 if the
instantaneous buffer length exceeds 75% of the
current window, and equal to 0 otherwise. If the
window sizes are equally spaced, a representative
choice would be to set BIG_BACKLOG equal to 1
if the instantaneous buffer length exceeds 150% of
the spacing between windows, and equal to 0 otherwise.

SPACE_CRUNCH. This quantity is set equal
to 1 at a given node or output router if the instantaneous
number of occupied cells, namely
GLOBAL_COUNT 310, at that node or router is
greater than some fraction F of the total number of
cells in the cell queue 210 or 406 of Fig. 2 or Fig.
4, respectively, and it is set equal to 0 otherwise. A
representative choice would be F = 7/8, although
the value of F does not appear to be critical.
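The four indicators can be summarized by the following sketch, which uses the representative thresholds mentioned above as default values. The function names and the way the measurements are passed in are assumptions made for illustration; BIG_INPUT and SOME_BACKLOG share one helper because both are described as thresholds on a smoothed occupancy estimate.

```python
def occupancy_indicator(r_prev, buffer_nonempty, f, threshold=0.5):
    """Update the smoothed occupancy r and derive a 0/1 indicator.

    Usable for BIG_INPUT (at a router) or SOME_BACKLOG (at a node).
    Returns the pair (new_r, indicator).
    """
    q = 1.0 if buffer_nonempty else 0.0
    r = (1.0 - f) * q + f * r_prev
    return r, (1 if r > threshold else 0)

def big_backlog(buffer_length, current_window, fraction=0.75):
    """1 if the instantaneous buffer length exceeds 75% of the window."""
    return 1 if buffer_length > fraction * current_window else 0

def space_crunch(global_count, total_cells, fraction=7 / 8):
    """1 if occupied cells exceed the fraction F of the cell queue."""
    return 1 if global_count > fraction * total_cells else 0
```

The 75% default corresponds to window sizes related by factors of 2; for equally spaced window sizes the caller would instead compare the buffer length with 150% of the spacing between windows.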

Various window management protocols may be
embodied using some or all of the congestion
indicators defined above. Without limiting the
scope of the invention, two embodiments are described
below. In each of the two embodiments,
each virtual circuit always has a buffer allocation at
least as large as the minimum size B_0 and it may
have other sizes variable up to the limit of a full
size window W_0. The first embodiment makes use
only of the length of a buffer at a data source (a
router) and the availability of free queue space at
the nodes to manage changes in window size. The
second embodiment makes coordinated use of
conditions relating to buffer lengths and free queue
space at the data source and at all the nodes of the
virtual circuit.

In each of the two embodiments, it is assumed
that both directions of the virtual circuit traverse
exactly the same nodes, and that each node has a
single monitor 200 that can read and respond to
messages carried by congestion control cells traveling
in either direction. If the forward and return
paths are logically disjoint, obvious modifications of
the protocols can be used. Instead of carrying out
some functions on the return trip of a control message,
one can make another traverse of the virtual
circuit so that all changes are effected by control
messages traveling in the forward direction.

In the first embodiment, the flow of control in
the program that runs in the monitor of the input
router 110 is shown schematically in Fig. 6. In Fig.
6, the quantity LIMIT refers to the existing buffer
allocation for a particular virtual circuit. The quantity
WINDOW_SIZE refers to a proposed new buffer
allocation. The input router 110 monitors the
quantity BIG_INPUT for each of its virtual circuits
(step 605 of Fig. 6). From time to time, as will be
described below, it may request a change in the
size of the window assigned to a given virtual
circuit. It makes such a request by transmitting a
control message over the virtual circuit (steps 608
and 614). In the embodiment described here, the
message is carried by a special congestion control
cell that is identified by a bit in its header. Alternatively,
the congestion control message may be
carried by special bits in a congestion field in the
header of an ordinary data cell, if such a field has
been provided. There is no logical difference between
the use of special control cells and the use
of header fields.

An input router that wishes to change the size
of its window transmits a message containing the
quantities 0, WINDOW_SIZE. The initial 0 represents
a variable called ORIGIN. Messages that
carry requests from input routers are distinguished
by the value ORIGIN = 0; messages that carry responses
from output routers have ORIGIN = 1, as
will appear below. WINDOW_SIZE is the size of
the requested window, coded into as many bits as
are necessary to represent the total number of
available window sizes. By way of example, if there
are only two possible sizes, WINDOW_SIZE requires
only a single 0 or 1 bit.
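One possible encoding of such a message is sketched below. The bit layout (ORIGIN in the most significant position, followed by an index into the list of available window sizes) and the function names are assumptions made purely for illustration; the text only requires that WINDOW_SIZE occupy as many bits as are needed to distinguish the available sizes.

```python
def window_size_bits(num_sizes):
    """Number of bits needed to index num_sizes distinct window sizes."""
    return max(1, (num_sizes - 1).bit_length())

def encode(origin, size_index, num_sizes):
    """Pack ORIGIN (0 or 1) and a window size index into one integer."""
    if origin not in (0, 1):
        raise ValueError("ORIGIN must be 0 or 1")
    if not 0 <= size_index < num_sizes:
        raise ValueError("window size index out of range")
    bits = window_size_bits(num_sizes)
    return (origin << bits) | size_index

def decode(word, num_sizes):
    """Return (ORIGIN, window size index) from a packed integer."""
    bits = window_size_bits(num_sizes)
    return word >> bits, word & ((1 << bits) - 1)
```

With two possible sizes the whole message fits in two bits: one for ORIGIN and one for WINDOW_SIZE.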

An input router that requests a new window
size larger than its present window size (steps 612,
614) does not begin to use the new window size
until it has received confirmation at step 616 (as
described below). On the other hand, a router does
not request a window size smaller than its current
allocation until it has already begun to use the
smaller window (step 606). Since switch nodes can
always reduce buffer allocations that are above the
initial window size, confirmation of a request for a
smaller window is assured.

When the node controller 212 of a switching
node along the forward path receives a control
message containing 0, WINDOW_SIZE, it processes
the message as follows. If the node controller
can make the requested buffer allocation it does
so, and passes the message to the next node
without change. If there is insufficient unallocated
space in the free list to meet the request, the node
allocates as large a buffer size as it can, the
minimum being the current buffer size. In either
case, the controller writes the value of
WINDOW_SIZE that it can allow into the message
before passing it along to the next node. The
output router also meets the requested value of
WINDOW_SIZE as nearly as it can, sets ORIGIN
= 1 to indicate a response message, and transmits
the response containing the final value of
WINDOW_SIZE to the first switching node on the
return path. Node controllers on the return path
read ORIGIN = 1 and the WINDOW_SIZE field
and adjust their allocations accordingly. The adjustments
involve, at most, downward allocations for
nodes that met the original request before some
node failed to do so. When the input router receives
a control message containing
1, WINDOW_SIZE, it knows that a set of buffer
allocations consistent with the value
WINDOW_SIZE exists along the whole path.
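The forward and return passes of this request can be simulated with the short sketch below. It is a simplified model under stated assumptions: each node is represented by its current allocation (LIMIT) for the virtual circuit and its amount of unallocated space, the output router is treated as the last element of the list, and the class and function names are hypothetical.

```python
class Node:
    """One node's view of a single virtual circuit (illustrative only)."""

    def __init__(self, limit, free):
        self.limit = limit  # current buffer allocation, LIMIT
        self.free = free    # unallocated space available at this node

    def adjust(self, window_size):
        # Move LIMIT as close to window_size as free space allows.
        grant = min(window_size, self.limit + self.free)
        self.free += self.limit - grant  # return or consume free space
        self.limit = grant
        return grant

def negotiate(nodes, requested):
    """Return the WINDOW_SIZE confirmed to the input router."""
    window = requested
    for node in nodes:            # request, ORIGIN = 0
        window = node.adjust(window)
    for node in reversed(nodes):  # response, ORIGIN = 1
        node.adjust(window)       # at most a downward adjustment
    return window
```

In this model a node that met the original request before some later node fell short gives back the excess on the return pass, so all allocations agree with the confirmed window when the response reaches the input router.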

A newly opened virtual circuit has a buffer
allocation B_0 at each node and has a window of
size B_0. The input router should request an increase
in window size as soon as it observes that
BIG_INPUT = 1. After requesting a window
change and receiving a response, the input router
may wait for some period of time D, such as 10 to
100 round-trip times, before inspecting
BIG_INPUT again. Then if BIG_INPUT = 1, it
may ask for another increase in window size, or if
BIG_INPUT = 0, it may ask for a decrease. If a
decrease is called for, the input router does not
issue the request until the amount of outstanding
data on the virtual circuit will fit into the smaller
window, and from that time on it observes the new
window restriction. The actual allocation is not
changed until the value of LIMIT is set equal to the
value of WINDOW_SIZE (steps 608, 618).

The flow of control in the program that runs in
the monitor of a switching node 114, in response to
the arrival of a congestion control cell from either
direction, is depicted in Fig. 7. Step 700 changes
LIMIT to match the requested window size as
closely as possible. Step 702 writes the new value
of LIMIT into the control cell and passes the message
along to the next node in the virtual circuit.

The previous embodiment has made use only
of congestion information at the input router. A
second embodiment employs a protocol that coordinates
congestion information across the entire
circuit in order to pick a new window size if one is
needed. It uses a two-phase signaling procedure, in
which the first phase sets up the new window and
the second phase resolves any discrepancies that
may exist among the new window and the buffer
allocations at the various nodes. The logical steps
carried out by the input and output routers and by
the switching nodes are illustrated schematically in
Figs. 8 through 12.

The protocol for the second embodiment uses
the quantities ORIGIN, BIG_INPUT,
SOME_BACKLOG, BIG_BACKLOG, and
SPACE_CRUNCH that were defined earlier. Since
the protocol uses two phases of signaling, it requires
one extra binary quantity, PHASE, which
takes the value 0 for Phase 1 and 1 for Phase 2. In
Phase 1, the input router 110 initiates a control
message carrying a 6-bit field that consists of the
quantities ORIGIN = 0, PHASE = 0, BIG_INPUT,
SPACE_CRUNCH = 0, SOME_BACKLOG = 0,
BIG_BACKLOG = 0. The flow of control for the
input router is depicted in Fig. 8.

The flow of control for a node controller is
shown in Fig. 9. When a node controller receives a
Phase 1 control message, it inspects the values of
SPACE_CRUNCH (step 900), SOME_BACKLOG
(step 904), and BIG_BACKLOG (step 910), and if
its own value of the given quantity is 0, it passes
that field unchanged. If its own value of the quantity
is 1, it writes 1 into the corresponding field, as
shown in Fig. 9 (steps 902, 906, 910), before
transmitting the control message to the next
switching node (step 912).

When the receiving router 120 receives a
Phase 1 control message, it first combines its own
values of SPACE_CRUNCH, SOME_BACKLOG,
and BIG_BACKLOG with the values in the arriving
message, just as the switching nodes have done.
The receiving router then inspects the last four bits
of the modified message and calculates a proposed
value of WINDOW_SIZE according to the
four cases below, using the logic flow shown in Fig.
10.

1) If BIG_INPUT = 1 and
SOME_BACKLOG = 0 (step 1000), then increase
the window size.

The virtual circuit is nowhere bottlenecked by
the round robin scheduler and the virtual circuit
would like to send at a faster rate; it is being
unnecessarily throttled by its window.

2) If BIG_BACKLOG = 1 and
SPACE_CRUNCH = 1 (steps 1002, 1004), then reduce
the window size.

Some node is bottlenecked by the round robin
scheduler and a big buffer has built up there, so
the window is unnecessarily big; and some node is
running out of space.



3) If BIG_INPUT = 0 and
SOME_BACKLOG = 0 and SPACE_CRUNCH = 1
(step 1006), then reduce the window size.

The virtual circuit has a light offered load, so it 
does not need a big window to carry the load; and 
some node is running out of space. 

4) In all other cases (step 1006), the present 
window size is appropriate. 
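The accumulation of congestion bits along the forward path and the four-case decision at the receiving router can be sketched as follows. The dataclass, the function names, and the string results are hypothetical; the cases are tested in the order in which they are numbered above, which is an assumption for any combination of bits that could satisfy more than one case.

```python
from dataclasses import dataclass

@dataclass
class Phase1Message:
    """Congestion bits carried by a Phase 1 control message."""
    big_input: int
    space_crunch: int = 0
    some_backlog: int = 0
    big_backlog: int = 0

def node_update(msg, space_crunch, some_backlog, big_backlog):
    # A node writes 1 into a field if its own value is 1,
    # and otherwise leaves the field unchanged (a logical OR).
    msg.space_crunch |= space_crunch
    msg.some_backlog |= some_backlog
    msg.big_backlog |= big_backlog
    return msg

def recommend(msg):
    """Return 'increase', 'decrease', or 'keep' for the window size."""
    if msg.big_input == 1 and msg.some_backlog == 0:
        return "increase"   # case 1: throttled only by its window
    if msg.big_backlog == 1 and msg.space_crunch == 1:
        return "decrease"   # case 2: big backlog and space is short
    if msg.big_input == 0 and msg.some_backlog == 0 and msg.space_crunch == 1:
        return "decrease"   # case 3: light load and space is short
    return "keep"           # case 4: present window is appropriate
```

Because every node only ever sets bits, the message that reaches the receiving router reports a condition if any node on the path observed it.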

The receiving router then transmits the Phase
1 control message to the first switching node on
the return path (step 1012). The response message
contains the fields ORIGIN = 1, PHASE = 0,
WINDOW_SIZE, where the last field is a binary
encoding of the recommended window size. Each
node controller 212 on the return path looks at the
control message and takes the action shown in Fig.
11. If an increased allocation is requested (step
1100), the node makes the allocation if it can (step
1102). If it cannot make the requested allocation, it
makes whatever allocation it can make, the minimum
being the present buffer size, and writes the
allocation it has made into the WINDOW_SIZE
field (step 1104). The node then transmits the
control message to the next node on the return
path (step 1106). If the request is for a decreased
allocation, the node does not make the decrease
yet, but it passes the WINDOW_SIZE field along
unchanged.

When the transmitting router receives the
Phase 1 response message (step 804), the
WINDOW_SIZE field indicates the window that the
virtual circuit is going to have. If there is an increase
over the present window size, it is available
immediately. If there is a decrease, the transmitting
router waits for the amount of unacknowledged
data in the virtual circuit to drain down to the new
window size, as shown in Fig. 8 at step 806. Then
it transmits a Phase 2 control message with the
fields ORIGIN = 0, PHASE = 1, WINDOW_SIZE
(step 810). Node controllers receiving this message
take the action shown in Fig. 12. They adjust
their buffer allocations downward, if necessary, to
the value of WINDOW_SIZE (step 1200), and pass
the control message along unchanged (step 1202).
The receiving router returns a Phase 2 response
message with the fields ORIGIN = 1, PHASE = 1,
WINDOW_SIZE. The switching nodes simply pass
this message along, since its only purpose is to
notify the transmitting router that a consistent set of
buffer allocations exists along the entire virtual circuit.
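The node actions for the Phase 1 response and for Phase 2 can be modeled with the sketch below. It is an illustration under simplifying assumptions: each node is reduced to its allocation (LIMIT) and its unallocated space, the wait for unacknowledged data to drain is omitted, and the class and function names are hypothetical.

```python
class Node:
    """One node's allocation for a single virtual circuit (illustrative)."""

    def __init__(self, limit, free):
        self.limit = limit  # current buffer allocation
        self.free = free    # unallocated space at this node

    def phase1_response(self, window_size):
        # Increase if requested, as far as free space allows;
        # a requested decrease is deferred until Phase 2.
        if window_size > self.limit:
            grant = min(window_size, self.limit + self.free)
            self.free -= grant - self.limit
            self.limit = grant
            return grant        # write the allocation actually made
        return window_size      # pass the field along unchanged

    def phase2(self, window_size):
        # Adjust the allocation downward, if necessary.
        if self.limit > window_size:
            self.free += self.limit - window_size
            self.limit = window_size
        return window_size      # message is passed along unchanged

def two_phase(nodes, recommended):
    """nodes are listed in forward order; returns the final window."""
    window = recommended
    for node in reversed(nodes):  # Phase 1 response on the return path
        window = node.phase1_response(window)
    for node in nodes:            # Phase 2 on the forward path
        node.phase2(window)
    return window
```

Phase 2 is what removes any excess granted by nodes that saw the Phase 1 response before a more constrained node reduced the WINDOW_SIZE field.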

After completing Phase 2, the transmitting router
waits for a while, as shown at step 816 in Fig. 8,
before beginning Phase 1 again. First it waits until
either a window's worth of data has been transmitted
since the end of Phase 2 or a certain period of
time D, such as 10 to 100 round-trip times, has
elapsed since the end of Phase 2, whichever
comes first. Then, if the present window size is
greater than the minimum window size B_0 (step
818) or if BIG_INPUT = 1 (step 800), Phase 1
begins immediately; otherwise, Phase 1 begins as
soon as BIG_INPUT = 1. A newly opened virtual
circuit, whose initial window size and buffer allocations
are B_0, should begin Phase 1 as soon as
BIG_INPUT = 1, if ever.



Claims 

1. A method of controlling the congestion of data
cells in a network having a plurality of switching
nodes and a plurality of incoming virtual
circuits at each node, said method comprising
the steps of assigning an initial cell buffer to
each virtual circuit at each node,

storing incoming cells for virtual circuits in
their respective buffers and removing cells
from the buffers for forward routing, characterized by

dynamically allocating buffer space selectively
to ones of the incoming circuits at a
node in response to signals requesting increased
or decreased data window sizes, respectively.

2. The method of claim 1 wherein the step of
assigning an initial buffer further comprises
assigning an initial buffer of predetermined
size to each virtual circuit.

3. The method of claim 2 wherein the predetermined
size of the initial buffer is less than the
size of a full data window, wherein a full data
window is defined as the product of the maximum
transmission bit rate of the virtual circuit
multiplied by a nominal factor representing
round trip propagation time in the network.

4. The method of claim 1 wherein the step of
dynamically allocating buffer space to a virtual
circuit further comprises allocating a full data
window in response to a signal requesting a
larger buffer space, wherein a full data window
is defined as the product of the maximum
transmission bit rate of the virtual circuit multiplied
by a nominal factor representing round
trip propagation time in the network.

5. The method of claim 4 further comprising the
step of requesting a larger buffer space based
on the amount of data waiting to be sent for
the said virtual circuit at the cell source.

6. The method of claim 4 wherein the step of
dynamically allocating a full data window further
comprises determining if sufficient free
buffer space exists at each node of the virtual
circuit to perform the allocation and denying
the request otherwise.

7. The method of claim 1 wherein the step of 
dynamically allocating buffer space further 
comprises allocating space to a virtual circuit 
in one or more blocks of fixed size. 

8. The method of claim 1 wherein the step of
allocating buffer space further comprises allocating
space to a virtual circuit in blocks of
variable size.

9. The method of claim 7 or claim 8 further
comprising the step of determining the size of
a block to be allocated at each node of a
virtual circuit based on the amount of data
waiting to be sent for the said virtual circuit at
the cell source.

10. The method of claim 9 wherein the step of 
dynamically allocating buffer space in re- 
sponse to a request for a larger buffer further 
comprises determining if sufficient free buffer 
space exists at each node of the virtual circuit 
to perform the allocation and denying the re- 
quest otherwise. 

11. The method of claim 9 further comprising the
step of determining the size of a block to be
allocated based on the amount of packet data
already buffered for the said virtual circuit at
each said node.

12. The method of claim 11 further comprising the
step of determining the size of a block to be
allocated at each node of a virtual circuit based
on the amount of free buffer space at each
said node.

13. The method of claim 11 wherein the step of
dynamically allocating buffer space in response
to a request for a larger buffer further
comprises determining if sufficient free buffer
space exists at each node of the virtual circuit
to perform the allocation and denying the request
otherwise.

14. The method of claim 13 wherein the step of
determining if sufficient free buffer space exists
at each node further comprises

transmitting a control message along a virtual
circuit from the first node in the circuit to
the last node in the circuit,

writing information into the control message
as it passes through each node describing
the amount of free buffer space that can be
allocated at the node, and

selecting the amount of buffer space assigned
to the virtual circuit at each node to be
equal to the smallest amount available at any
node of the virtual circuit based on the final
results in the control message.

15. The method of claim 13 wherein the step of
determining if sufficient free buffer space exists
at each node further comprises

transmitting a control message along a virtual
circuit from the first node in the circuit to
the last node in the circuit, the control message
containing information representing
whether a large or small amount of data is
buffered at the initial node for the virtual circuit
and information representing the availability of
free buffer space at the initial node,

overwriting said information in the control
message with new information as it passes
through each node if the new information at a
node is more restrictive, and

selecting the amount of buffer space assigned
to the virtual circuit at each node based
on the final results in the control message.

16. The method of claim 14 wherein the step of 
determining if sufficient free buffer space ex- 
ists at each node further comprises 

performing the selecting step at the final 
node, and 

returning a second control message from 
the last node through each node of the virtual 
circuit, and 

adjusting the allocation at each node in 
response to the second control message. 

17. The method of claim 14 wherein the step of 
determining if sufficient free buffer space ex- 
ists at each node further comprises 

returning the control message from the last 
node in the virtual circuit to the first node in 
the virtual circuit, 

performing the selecting step at the initial 
node, 

transmitting a second control message 
from the first node to the last node to perform 
the allocation. 

18. The method of claim 2 wherein the size of the
initial cell buffer is equal to the size of a full
data window divided by the square root of the
maximum number of virtual circuits that can
simultaneously exist in any node.

19. The method of claim 1 or claim 2 or claim 3 or
claim 4 or claim 7 or claim 8 further comprising
the step of discarding data for a virtual
circuit during buffer overflow for the said virtual
circuit.

20. The method of claim 1 or claim 2 or claim 3 or
claim 4 or claim 7 or claim 8 further comprising
the step of requesting a reduction in the
allocated buffer space for the virtual circuit
after a prior increase of the buffer space above
the initial buffer space based on the amount of
data waiting to be sent for the said virtual
circuit at the cell source.

21. The method of claim 20 wherein the step of
requesting a reduction in the allocated buffer
space for the virtual circuit after a prior
increase of the buffer space above the initial
buffer space is further based on the amount of
data already buffered for the said virtual circuit
at each said node.

22. The method of claim 21 wherein the step of
requesting a reduction in the allocated buffer
space for the virtual circuit after a prior
increase of the buffer space above the initial
buffer space is further based on the amount of
free buffer space at each said node.
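
Claims 20 to 22 name three inputs to the reduction decision without fixing how they combine. The sketch below is one possible policy, stated as an assumption: request a reduction when the allocation exceeds the initial size and either all backlogs are at or below a threshold, or free space at some node falls under a floor. The thresholds and the function name are hypothetical.

```python
def should_request_reduction(allocated, initial, source_backlog, threshold,
                             node_backlogs=(), free_space=(), free_floor=None):
    """Decide whether to ask for a smaller buffer (assumed policy)."""
    if allocated <= initial:
        return False  # there was no prior increase to undo
    # Claim 20 input: data waiting at the source.
    # Claim 21 input: data already buffered at each node.
    small_everywhere = (source_backlog <= threshold
                        and all(b <= threshold for b in node_backlogs))
    # Claim 22 input: free buffer space at each node.
    space_scarce = (free_floor is not None
                    and any(f < free_floor for f in free_space))
    return small_everywhere or space_scarce
```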

23. The method of claim 1 wherein the step of
dynamically allocating buffer space further
comprises

transmitting a control message along a virtual
circuit from the first node in the circuit to
the last node in the circuit,

writing information into the control message
as it passes through each node describing
the amount of free buffer space that can be
allocated at the node, and

selecting the amount of buffer space assigned
to the virtual circuit at each node to be
equal to the smallest amount available at any
node of the virtual circuit based on the final
results in the control message.
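
The selection in claim 23 reduces to taking a minimum over the per-node figures. In this sketch the control message is modeled as a plain list that each node appends to; that representation and both function names are assumptions.

```python
def collect_free_space(free_per_node):
    """Simulate the message passing first to last, each node adding its figure."""
    message = []
    for free in free_per_node:
        message.append(free)
    return message


def select_allocation(message):
    """Pick the smallest amount any node on the circuit can allocate."""
    return min(message)  # raises ValueError if the circuit has no nodes
```

Using the minimum means every node on the circuit can honor the same allocation.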

24. The method of claim 23 wherein the step of
dynamically allocating buffer space further
comprises

performing the selecting step at the final
node, and

returning a second control message from
the last node through each node of the virtual
circuit, and

adjusting the allocation at each node in
response to the second control message.


25. The method of claim 24 wherein the step of
determining if sufficient free buffer space exists
at each node further comprises

returning the control message from the last
node in the virtual circuit to the first node in
the virtual circuit,

performing the selecting step at the initial
node,

transmitting a second control message
from the first node to the last node to perform
the allocation.